A joint translation model with integrated reordering
Author
Abstract
This dissertation aims to combine the benefits and remedy the flaws of the two popular frameworks in statistical machine translation, namely phrase-based MT and N-gram-based MT. Phrase-based MT advanced the state of the art by translating phrases rather than words. By memorizing phrases, phrase-based MT is able to learn local reorderings and to handle other local dependencies such as insertions and deletions. Inter-phrasal reorderings are handled through the lexicalized reordering model, which remains the state-of-the-art model for reordering in phrase-based SMT to date. However, phrase-based MT has some drawbacks:
• Dependencies across phrases are not directly represented in the translation model
• Discontinuous phrases cannot be represented and used
• The reordering model is not designed to handle long-range reorderings
• Search and modeling problems require the use of a hard reordering limit
• The presence of many different equivalent segmentations increases the search space
Note: in this thesis, "phrases" means sequences of words, which may not be linguistic phrases.
Similar resources
A Joint Sequence Translation Model with Integrated Reordering
We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance r...
Word Reordering in Statistical Machine Translation with a POS-Based Distortion Model
In this paper we describe a word reordering strategy for statistical machine translation that reorders the source side based on Part of Speech (POS) information. Reordering rules are learned from the word aligned corpus. Reordering is integrated into the decoding process by constructing a lattice, which contains all word reorderings according to the reordering rules. Probabilities are assigned ...
Modeling the Translation of Predicate-Argument Structure for SMT
Predicate-argument structure contains rich semantic information of which statistical machine translation hasn’t taken full advantage. In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation: 1) a predicate translation model and 2) an argument reordering model. The predicate translation model explores lexical ...
Dynamic distortion in a discriminative reordering model for statistical machine translation
Most phrase-based statistical machine translation systems use a so-called distortion limit to keep the size of the search space manageable. In addition, a distance-based distortion penalty is used as a feature to keep the decoder translating monotonically unless there is sufficient support for a jump from other features, particularly the language models. To overcome the issue of setting the op...
Reordering by Parsing
We present a new discriminative reordering model for statistical machine translation. The model employs a standard data-driven dependency parser to predict reorderings based on syntactic information. This is made possible through the introduction of a reordering structure, which is a word alignment structure where the target word order is transposed onto the source sentence as a path. The appro...
Publication date: 2012